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ON THE PERFORMANCE OF FDR CONTROL: 
CONSTRAINTS AND A PARTIAL SOLUTION^ 
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University of Connecticut 

The False Discovery Rate (FDR) paradigm aims to attain certain 
control on Type I errors with relatively high power for multiple hy- 
pothesis testing. The Benjamini-Hochberg (BH) procedure is a well- 
known FDR controlling procedure. Under a random effects model, 
we show that, in general, unlike the FDR, the positive FDR (pFDR) 
of the BH procedure cannot be controlled at an arbitrarily low level 
due to the limited evidence provided by the observations to separate 
false and true nulls. This results in a criticality phenomenon, which 
is characterized by a transition of the procedure's power from be- 
ing positive to asymptotically without any reduction in the pFDR, 
once the target FDR control level is below a positive critical value. 
To address the constraints on the power and pFDR control imposed 
by the criticality phenomenon, we propose a procedure which applies 
BH-type procedures at multiple locations in the domain of p-values. 
Both analysis and simulations show that the proposed procedure can 
attain substantially improved power and pFDR control. 

1. Introduction. Since the original work of Benjamini and Hochberg [2], 
the False Discovery Rate (FDR) paradigm has become an attractive ap- 
proach to multiple hypothesis testing (cf. [1, 2, 5, 6, 7, 8, 9, 10, 12, 15, 16, 17] 
and references therein) . FDR is the expected value of the false discovery pro- 
portion (FDP), that is, the proportion of falsely rejected hypotheses among 
all those rejected if there is at least one rejection, and otherwise. Denot- 
ing by R the total number of rejected nulls, and V that of rejected true 
nulls, FDP = ^ and FDR = S[FDP]. One of the most well-known FDR 
controlling procedures is the Simes procedure [14] adopted by Benjamini 
and Hochberg [2], henceforth referred to as the BH procedure. It is now a 
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classical result that, under certain conditions on the p- values, for any target 
control level a € (0, 1), the BH procedure can attain FDR < a [2, 7, 17]. 

One important issue related to FDR control is power. Let n be the number 
of nulls being tested and N the number of true nulls among them. Then 
power = (7^3^^- A main merit that FDR control is thought to have is its 
relatively high power compared to Familywise Error Rate (FWER) control, 
which is on P{V > 0). Many papers show that incorporating an estimate of 
^ can increase the power of the BH procedure [3, 9, 15, 18]. 

Another issue related to FDR control is positive FDR (pFDR), which is 
the expectation of FDP, conditional on there being at least one rejection: 
-E'l^l-R > 1] [15, 16]. Conceptually, pFDR is important for follow-up studies 
once discoveries are made. However, it is known that for FDR control with 
a fixed rejection region, pFDR — > FDR as n ^ cxo [8, 15]. Therefore, pFDR 
has been practically treated the same as FDR for n S> 1. 

Despite the importance of power and pFDR, there seems to be little work 
on whether there are any constraints on them. We shall take an asymptotic 
approach to this issue, as FDR control is often applied when n, the number 
of tested hypothesis, is large. There are two basic and interrelated questions. 
First, is pFDR always asymptotically the same as FDR? Second, can the 
BH procedure always attain an asymptotically positive power for a target 
FDR control level? 

The relevance of these two questions will be illustrated by several exam- 
ples under the setting of a random effects model. The examples include t- 
tests, F-tests and multiple tests where the false null distribution is a mixture 
of Gaussians with variances smaller than the true null Gaussian distribution. 
As will be seen, in each example there is a critical value a^: > for the target 
FDR control level a giving rise to different regimes of behavior of the BH 
procedure. When < a < a*, the power of the BH procedure decays to at 
the rate Op{l/n). Meanwhile, the pFDR converges to a certain constant /3* 
which is strictly greater than the FDR. As a result, the pFDR and FDR 
are different. When a > a^,, the power of the BH procedure converges to a 
positive constant. However, while the pFDR and FDR are asymptotically 
equal, both are strictly greater than (3^,. Thus, the pFDR is always bounded 
below by /3* , which actually can be quite large if the proportion of false nulls 
is low. In contrast, the FDR can be controlled at any level. This controlla- 
bility, however, has a cost: when a < a^,, the power of the BH procedure is 
of the same order as the power of a FWER controlling procedure. 

Importantly, the above "criticality" phenomenon is not peculiar to the 
BH procedure. Once test statistics are selected, for all the multiple testing 
procedures based on them, there is a common, possibly positive lower bound 
on the pFDR. Pushing the FDR below this bound separates the FDR and 
pFDR, and leads to asymptotically zero power. The bound is "intrinsic" 
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in that it is purely a consequence of the distributional properties of the 
test statistics. How the bound affects power and pFDR control for multiple 
testing in general is studied elsewhere [4]. 

In view of the criticality phenomenon, it is natural to explore ways to 
improve the performance of FDR control. As demonstrated by much work, 
this is possible by appropriately increasing the target FDR control level of 
the BH procedure [3, 9, 15, 18]. In this article we propose a procedure which 
applies BH-type procedures at different locations, or "reference points" in 
the domain of p-values. The idea is to utilize the distributional properties 
of the p-values more effectively, which contain important information that 
distinguishes false nulls from true nulls. When the distribution function of 
the p-values is concave, as in the case for t- and F-tests, this procedure is 
asymptotically the same as the BH procedure. However, in general, as in 
the case involving Gaussian mixtures, the power and pFDR control can be 
improved significantly. Unlike the BH procedure, which generates a single 
random rejection interval containing 0, the multi-reference point procedure 
can generate multiple random rejection intervals. 

The rest of this article is organized as follows. Section 2 collects the main 
theoretical results on the criticality phenomenon of the BH procedure. We 
identify the critical value for the target FDR control level and the lower 
bound for pFDR, and state various asymptotics of the BH procedure. Sec- 
tion 3 considers examples involving multiple t-, F- and z-tests. For the first 
two, the strict concavity of the distribution of p-values will be shown. The 
limitation of the BH procedure for the ^;-tests provides some motivation for 
Section 4, which proposes the aforementioned multi-reference point proce- 
dure. Section 5 reports a numerical study. Section 6 concludes with some 
remarks. Details of the proofs are given in the supplemental material. 

The rest of this section covers preliminaries. Given unadjusted marginal 
p- values pi, . . . ,pn, each one for a different null hypothesis, let < • • • < 
Pn:n be their order statistics. Set pn-.o = and Pn.n+i = 1- Given the target 
FDR control level a G (0, 1), the BH procedure rejects all hypotheses with 
p- values < Pn:Rn , where 

(1.1) Rn = maxjj > 0:pn;j < ^ 

The number of false rejections and power are 

Rn — Vn 



Vn = #{j <n:pj< pn:R„ , the jth null is true}, power„ ^ / 1 • 

(n — iV j V 1 

In the following sections, the p-values are assumed to be sampled from a 
random effects model as follows. Let the population fraction of false nulls 
among all the nulls be a fixed vr £ (0, 1). Let 9j := l{the jth null is false}. 
Then (pi, ^i), (p2) ^2)5 • ■ • are i.i.d., such that Pr{^j = 1} = vr and 

Fi{pj < u\9j = 0}=u, Fi{pj < u\9j = 1} = G{u), u £ [0, 1]. 
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Under this model, FDR = (1 — 7r)a for the BH procedure [2, 7, 17]. 

2. Main theoretical results. This section collects analytical results on the 
criticality phenomenon of the BH procedure, that is, there can be a critical 
value > such that, for a < a* and a > a^, the asymptotic behavior of 
the procedure is categorically different. 

By the random effects model, the common distribution function of the 
p- values is 

(2.1) F{u) = {l-7r)u + 7rG{u). 

We shall always assume that F G C([0, 1]) with -F(O) = 0. Denote 

u 

(2.2) a^ = inf-— <1, = (l _ 7r)a*. 

«>o r [u) 

Figure 1 illustrates the meaning of a*. In particular, 

1 1 - TT 

(2.3) F is strictly concave =^ a^ = "p^, /?* = ;p(^- 

To get an idea why is the critical value, observe i?„ = max{j :pn:j < 
aFn{pn:j)}, where F^ is the empirical distribution of the p- values. For n ^ 1, 
Fn ~ F. Then the largest rejected p- value ~ u*, and Rn/n where 

= maxjti G [0, 1] : u/a < F(u)}, = F{u^) = u*/a. 

To be more precise, in a certain probabilistic sense, 

ti* = lim[Largest rejected p- value], 

(2.4) 

= lim [Proportion of rejected p- values]. 
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From Figure 1, it can be seen that for a < a^, =p^, = 0, which suggests 
that Rn is of order o(n) and its asymptotic behavior should depend only 
on the local properties oi y = u/a and y = F{u) around 0, especially a and 
F'(0). On the other hand, for a > a*, p,,, > and hence i?„ ^ np^. Finally, if 
y = u/a and y = F{u) are only tangent at as in Figure 1(b), then a third 
type of asymptotic behavior arises when a = a* . 

The asymptotic distribution of the BH procedure when a < a^, can be 
characterized as follows. Note that 1 — vr < F'(0) < 

Theorem 2.1. Suppose a £ (0, a*). Let c = aF'{0). Then, as n —> oo, 

Rn -i T := max{j : Sj < 0}, 

the last time of excursion into (— oo,0) by the random walk Sq = 0, Sj = 
Sj~i + 7j — c, j > 1, with 7i, 72, • • • i-i.d. ~ Exp(l) with density e~~^ , x > 0. 
The distribution of r is 

(2.5) Pr{T = A;} = — (l-c)c*^e-*^^ fc = 0,l,.... 

k\ 

Consequently, the power of the BH procedure is of order Op(l/n). Further- 
more, 

(2.6) pFDR^/3* 
and 

oo 

(2.7) d'TY{C{Vn\Rn = /c) , Binomial(A;, )) Pr{i?„ = A;} ^ 0, 

k=l 

where d^yd-iju) := J2k l/^fc ~ ^fcl denotes the total variation distance of two 
distributions /i and v on 0,1,2,..., and £(V^|i?n = ^) is the conditional 
distribution ofVn- 

The limiting distribution of Rn was established for vr = in [7]. The 
characterization of the distribution in terms of an excursion time is new in 
the context of FDR. The implication of Theorem 2.1 is much stronger than 
that in [7] . It shows that when a < a* , the total number of true discoveries 
is bounded, no matter how large n is, even though there is a fixed positive 
fraction of false nulls. Equation (2.6) indicates that the pFDR cannot be 
lower than > 0. Equation (2.7) means that, given a nonzero value of 
Rn, the conditional distribution of Vn approximately follows a Binomial 
distribution. 

Next we consider the case where a^, < a < 1. To establish the asymptotics, 
we assume that all the multiple tests are based on an infinite sequence of 
hypotheses with corresponding p- values pi,p2, ■ ■ ■ , such that i?„ and V„ are 



6 



Z. CHI 



attained from the BH procedure applied to the first n of them. We first 
consider the practicahy important case a > a* . Then < n^, < < 1 . In 
[8] it is shown that the proportion of rejected hypotheses i?ri/n— in 
probability. The law of iterated logarithm (LIL) below characterizes the 
convergence of Rn/n in a stronger sense, namely, almost sure convergence. 

Theorem 2.2. Suppose a G (a*, 1) and A = 1 - aF'{u^) > 0. Let = 
1 — p^. Then 

(2.8j hmsup± = , a.s. 

n \Jn log log n A 

Furthermore, Rn/n is asymptotically proportional to the power: 

(2.9) power = — f — + a] + Op(l) ^ G(u^), a.s. 

n \ TT / 

The condition on A means that the line y = u/a crosses the graph of 
y = F{u) instead of being tangent at (u^,,p^,); see Figure 1. Theorem 2.2 
implies that Rn/n ^ p^ at rate 0(-\/log logn/n). Since now the pFDR is 
asymptotically equal to the FDR, and the latter is equal to (1 — TT)a [2, 7, 17], 
the pFDR is asymptotically strictly greater then If vr < 1, then (2.9) 
implies 

power„ > Rn/n, for n » 1. 

The inequality provides a conservative estimate of the power, which can be 
useful since neither the number of false nulls nor that of rejected false nulls 
is directly observable. 

The main tool to prove Theorem 2.2 is Kiefer's result on Bahadur repre- 
sentation ([13], Section 15.1). Details of the proof are given in the supple- 
mental materials. 

Conceptually, it is of interest to consider the behavior of the BH procedure 
when Q = a*. The next result deals with the case where y = u/a* and y = 
F{u) are tangent at and have no other intersection points; see Figure 1(b). 



Theorem 2.3. Suppose F is twice dijjerentiable at and F'{0)u > F{u) 
for u>0. Then = 1/F'(0). Suppose I = {i > 1 : fW(0) / 0} / 0. Let 
£ = min/. Ifa = a.^, then 

(2.10) hm = vq:=— — -, a.s. 

n^oo log n 2i — l 

The upper hound of Rn can be strengthened to 

V ^ ^ f ^!V2(l + z.o)F-(0)^ ]^(i-o) 

(2.11) limsup r-j << ; — , > , a.s. 

^ ^ n^oo^n^o(logn)i-^« - \ |i^W(0)l i 

Furthermore, a.s., Vn/Rn — > /?* and true discoveries Rn — V„ ~ (1 — /9*)i?„. 
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The last result in Theorem 2.3 shows that choosing a = a* is optimal 
in the sense that as n ^ oo, the pFDR asymptotically obtains the lower 
bound /3=K and at the same time the number of true discoveries is unbounded, 
although growing sublinearly. 

3. Examples. This section considers examples where the criticality phe- 
nomenon occurs. In each example, when a null is true, the corresponding 
test statistic X with density Voi otherwise, X ~ ^'i with density '01 ■ 
Under the random effects model, X ~ = (1 — 7r)^'o + vr^i. We shall focus 
on the right-tail p- value 1 — "^oiX). From probability theory, X ~ 
where [/ ~ Uniform(0, 1). Therefore, the p- value has the same distribution 
as 1 — ^o(^~^(f^))) which has distribution function 



F{u) = 1 - ^(^-o ^(1 - u)) = (1 - 7r)ii + tt[1 - ^'i(^'o ^(1 - u))]. 

Comparing with (2.1), we get G{u) = 1 — ^'i o ^(1 — u). The density of 
the p- value is 

(3.1) F'{u) = l-7r + Trp{x), with x = ^(1 - n), p(x) = ^^^44- 



From Section 2, a necessary and sufhcient condition for the criticality 
phenomenon to occur is a* > 0. It can be seen that this is equivalent to 



F'(0) < oo (cf. Figure 1). By (3.1), F'(0) = 1 - 7r-F7rlim^._oo p(2;). Therefore, 



3.1. Multiple t-tests. Consider normal distributions N{fii,ai), i>l, with 
Pi and C7j being unknown. Suppose that each null is Hi : pi = 0, and when 
it is false, pi = c> 0. Assume that all cTj are equal but this information is 
unknown to the investigator. To test Hi, let Yi^i, . . . , li,jy+i i-i-d. ~ N{pi,ai) 
be collected. If pi = 0, then the t-statistic of the sample follows the t- 
distribution with v degrees of freedom (d.f.); whereas if pi = c, it follows 
the noncentral t-distribution with v d.f. and noncentrality parameter 6 = 
\/u + Ic/a, the density of which is 





l-7r-h7rlima._ooP(a:)' 
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Observe tufi{x) =tu{x), the density of the i-distribution with d.f. We 
have 




The criticahty phenomenon for the t-tests foUows from the next result and 
(3.2). 

Proposition 3.1. If6>0, then 

,3.e, ,^^M.)^.--|r(^)<:f)!/r(i^)<». 

Furthermore, the distribution function of the p-values is strictly concave. 

Proof. It is not hard to see (3.6) holds. For the second statement, by 
(3.4) it suffices to show that p{x) is strictly increasing. It is not hard to see 
this is the case for x > 0. To finish the proof, denote a = ^/26 and let 

oo 

h,{s) = Y,T 

k=0 

Then for x < 0, p{x) = hy{(f){x)) / C , with 

Because (j) is strictly decreasing for x < 0, in order to show that p(x) 
is strictly increasing on (— oo,0), it suffices to show that h^{s) is strictly 
decreasing on (0, 1). Since 4> : (— oo, 0) (0, 1) is one-to-one and onto, for any 
s G (0, 1), hy{s) = Cp{4>~'^{s)) > 0. It is easy to see that h'^{s) = — a/i^+i(s). 
By the same argument for h^, /i^_|_i(s) > for all s G (0, 1). Then h'^{s) < 
and hence, /i^ is strictly decreasing on (0, 1). □ 

3.2. Multiple F-tests. Consider regression models Y = f3f^ + Cj, i > 1, 
where /3j is p-dimensional and ej ~ A^(0,cjj) is independent of X. Suppose 
that for each i, the null is Hi: fi^ = 0, and when it is false, /3j = /3 7^ 0, with 
(3 being unknown. Assume that all ai = l, but this information is unknown 
to the investigator. To test Hi, let an independent sample (li^i,xi), 
(li^^+p, Xj,_i_p) be collected, where xi , . . . , x^y+p are fixed covariates for all i. 
If Pi = 0, then the -F-statistic of the sample follows the F-distribution with 
{PjI') d.f.; otherwise, the statistic follows the noncentral i^'-distribution with 



for. G [0,1]. 
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Monotonicity lirrix ^ ao pk{x) 



(Tfc = 1, fik > strictly increasing oo 

CTfe = 1, fik < strictly decreasing 

CTfc = 1, ^fc = 1 1 

CTi. < 1 maximized at — 

k 

(Tfc > 1 minimized at oo 



(p, z^) d.f. and noncentrality parameter 5 = (/3"^xi)^ + • • • + (/J^^x^y+p)^, the 
density of which is 



^^k]Bip/2 + k,iy/2)\l + 6x, 

x>0, 

where 9 = p/u and B{a, h) = T{a)T{h) /V[a + b) is the Beta function. Observe 
that fp^u,o{x) = fp^uix), the density of the usual F-distribution with (p,!^) 
d.f. For X > 0, 

(37) p(x) - W^) _,-./2^fP u\f^^S/2y^(^\' 
^ fpA^) ''\2'2j ir'klB{p/2 + k,u/2)[l + 9xJ ' 



which is strictly increasing, and 



(3.8) hm p{x) = e-'/'B (^,-)T ,, , ^^^^^ , , < oo. 

^ ' x^oc^"- ^ \2' 2J ^^k\B{p/2 + k,v/2) 

Therefore, the criticality phenomenon occurs with the F-tests. Note that by 
Stirling's formula, B{a + k,h) ^ T{b)k~^ as /c — > oo, hence the convergence 
in (3.8). 

3.3. Multiple z-tests. Suppose that the distribution under a true null 
is ^0 = -^(0, 1) and we have complete knowledge about this. The distri- 
bution ^1 under a false null on the other hand is a mixture of Gaussians 

piN{fxi,ai)-\ \-pmN{firn,crm), with pi H hpm = 1 and 0<pk<l. The 

likelihood ratio function in this case is 

i^iix) ^pk ^Pk r {x - i^k)'^ - alx"^ \ 
p{x) = —TT = 2^ —Pk[x) = 2^ — exp<^ '>. 

All the possibilities for each pk{x) are given in Table 1. 

Therefore, if all the components of the Gaussian mixture have variances 
less than 1, from (3.2), no matter what the signs of their means are, the 
criticality phenomenon occurs. 
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3.4. Tests involving mixtures of shift-scaled densities. The example in 
Section 3.3 is a special case where the distribution under the false null is a 
mixture of shift-scaled versions of the distribution under the true null: 

m m 

(3.9) ^'i(x) = ^pfc^o(sfc2; - tpi{x) = ^pkSktpo{skX-tk). 

k=l k=l 

In most of the practical cases, the null density ipQ satisfies the tail condition 

ipoisx — t) 

(3.10) limsup — < oo, any s > 1 and t. 

If the right tail of ipo is rapidly decaying in the sense that 

ipo(sx — t) ^ ^ , 

lim r~r^ — = L); s > 1 and t, 

x^oo ^po{x) 

then the criticality phenomenon occurs when Sfc < 1 for all the components 
in the mixture (3.9). This is the case for Gaussian mixtures. On the other 
hand, if the right tail of ipo is slowly decaying in the sense that 

lim inf '^^^ — — — — ^ = a(s, t) > 0, any s > 1 and t, 

x^oo ipolX) 

then it is seen that (3.10) holds for any s > and t. As a result, limsup^.^oo p{x) < 
cxo and the criticality phenomenon occurs. 

As an example, the density of the Cauchy distribution with scale s is 
Hsix) = suj[i+\x/sy^] ' "^hsre u! denotes the circumference-diameter ratio of a 
circle. Suppose X ~ /^i under a true null and X ^ fig with s ^ 1 under a false 
null. Because /is(x) =s~^fii{x/s) and the tail of ^1(2;) is slowly decaying, 
the criticality phenomenon occurs with the BH procedure when it is applied 
to multiple tests on the scales of the distributions. 

4. A multi-reference point procedure. Under the random effects model, 
it is shown in [4] that the infimum of the pFDR attainable by multiple 
testing is/3= (1 — vr)/ supF'. In general, supF' < sup-^^ and hence, /3 
the infimum of the pFDR attainable by the BH procedure. This raises the 
question as to how to modify the BH procedure in order to improve its 
pFDR control. 

To get some idea, consider the example of a Gaussian mixture in Sec- 
tion 3.3. If every component of ^'i has a smaller variance than \I'0) then 
most of the p-values from false nulls cannot be very small. On the other 
hand, if the target FDR control level is small enough, the BH procedure will 
only reject small p-values, and hence, overlook most of the p-values from 
false nulls, causing a loss in power and capability to control the pFDR. 

The example suggests that in order to improve power and pFDR control, 
an FDR controlling procedure should look at not only small values, but 
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also moderate or even large ones for candidates of rejection. Is this reason- 
able? To answer the question, consider how the p-value is defined. First, 
one has to choose what amounts to "unusualness" of a single observation, 
for example, a large difference from 0. Then the p-value is defined in terms 
of this pre-specified unusualness under a true null. There is no guarantee 
that the p-value accounts for what is actually unusual about a population 
of observations from false nulls. For the Gaussian mixture example, is 
unusual exactly because it generates too many "usual" observations. Reject- 
ing small p- values overlooks this. However, unlike single hypothesis testing 
in multiple hypothesis testing the unusualness of a population of p-values 
may be detectable. In the above example, the histogram of p- values exhibits 
several peaks due to different components of ^'i. If the "attention" of an 
FDR controlling procedure can be distributed to the peaks, then it may 
have improved power and pFDR control. 

4.1. Description. The proposed procedure will be referred to as "Pro- 
cedure M," as it combines BH-type procedures at multiple locations or 
"reference points." Given t € [0,1], denote Rn{t) = #{i <n:pj < t} and 
Rnit) = Rn{t-) = #{i <n:pj < t}. Then the "forward" BH-type procedure 
at t rejects nulls with p- values in [t,Pn:Un{t)]j while the "backward" BH-type 
procedure rejects nulls with p- values in [Pn:L„{t)iO) where 

Ln{t) = R"M -r~{t) + l, Unit) = Rnit) + r+(t), 

with 

r+(t) = maxjO < j<n- Rnit) -Pn: R„{t)+j - * < — |, 
r„(t) =max|0<i <i?°(t):t-p„,,ij,o(j)„j+i < — |- 

The total number of rejections at t is then r„(t) = r+(t) + r~(t) = Unit) — 
Lnit) + l. 

Given reference points = to < < ' " ' < = 1) Procedure M is as fol- 
lows: 



Step 1. Select < tj^ < • • • < such that: 

(1) Tnitif,) > (logn)'^ for each k = 1, . . . ,m, where c > 1 is a parameter; 

(2) [LnitiJ,Unitif^)] are disjoint, that is, Uniti^) < L„(tjj^^J for = 1, . . . , m; 
and 

(3) X^fcLo '''niti^, ) is the largest among all subsets of t's satisfying (1) and (2). 
Step 2. Reject al\pj falling into the union of J(ti J := [Pn:L^{u )^Pn:Ur^{u )]■ 
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Fig. 2. Rejections at different reference points. The parallel thin lines passing through 
(0,0), {t,F{t)) and (1,1) all have slope 1/a. 



Condition (1) in Step 1 requires that every selected reference point should 
have at least (logn)*^ rejections. As for the BH procedure, the BH-type 
procedures at each t may have a critical value a*(t) > such that, for a < 
CK*(i)5 fn{t)/{lognY for any c > 1 and for a > a*(t), r„(t) is of order 
n'^ for some a > 0. Condition (1) is used in the case a < a*(t). 

The optimization in Step 1 can be efficiently computed by dynamic pro- 
gramming; see the Appendix. In general, Procedure M generates several 
random rejection intervals, each one a connected component of the union of 
J{tif,) associated with selected tj^.. The BH procedure, on the other hand, 
always generates a single random rejection interval. 

4.2. Some justifications. To see roughly why Procedure M can increase 
power and improve pFDR control, first consider Figure 2(a). In this case, by 
Theorem 2.1, the BH procedure asymptotically has power 0. On the other 
hand, for the t in Figure 2(a), a > 1/F'{t). By Proposition 4.1, 

Pn:L„{t) ^ := inf{x G [0, t] : t - X < a{F{t) - F(x))}, 

(4.1) 

Pn:U„{t) uf := sup{x G [t, 1] : X - t < a{F{x) - F{t))}. 

Similar to the BH procedure, among the nulls with values between Pn:L„{t) 
and Pn:U„{t)i expected proportion of true nulls is less than a. Therefore, 
the BH-type procedure at t has positive power. Likewise, the backward BH- 
type procedure at t = 1 has positive power. Procedure M thus controls the 
FDR at a with positive power. 

Now imagine that pj = \ — pj instead of pj are used for the tests. Then 
the distribution function of the p- values becomes F{x) = 1 — F{1 — x). Since 
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F'(0) = F'{1) > 1/a, by Theorem 2.2 the BH procedure has positive power. 
However, by combining the BH-type procedm'e at I — t, the power is in- 
creased by a positive factor. 

To see why Procedure M requires selection of ti- , note that [Pn:L„{t) > Pn:Un{t)] 
may overlap for different t. In Figure 2(b), since F is convex, for t G {u^ , 1), 
u~[ =t and = 1, whereas for t < uj~, u~[ = uf = t. Therefore, for ?i » 1, 
only = 1 is selected. 

As an example, consider the tests on Cauchy distributions in Section 3.4 
again. If the p- value of each observation X is defined as the two-tailed prob- 
ability Pr{|VF| > where W ^ ni is independent of X, then it has dis- 
tribution function 

2-K _i / 1 OJU 

i^2-tailW = (1 - vrjiiH cot -cot — 

00 \s z 

withF^.,^il(0) = l-^ + 7rs. 

The critical value is a* = j— j^q-^. Let a G (a*, 1). As F2-taii is strictly con- 
cave, the BH procedure and Procedure M are asymptotically the same, both 
rejecting p-values in [0,ii*], with G (0, 1) the solution to u/a = -F2-taii('w)- 
On the other hand, if the p-value is defined as the left-tail probability 
Pr{M/^ < X}, then its distribution function is 

F .„N_/l^2-tail(2n), Q<u<\, 
i^lcftW-|^_i^^_^^^^^2-2n), i<n<l. 

Asymptotically, the BH procedure only rejects p- values in [0, u^], while Pro- 
cedure M rejects those in [O,!*,]"] U [1 — u^, 1], with Uq G (0, 1) the solution 
to u/a = Fieft(u) and G (0,1) the solution to (1 — u)/a = 1 — Ficft(n). 
It is seen that Uq = \ — Ui = Thus, the power of the BH procedure 

is reduced by half, while that of Procedure M is unchanged. Procedure M 
therefore is less sensitive to the choice of p- values. 

4.3. Theoretical properties. The following characterizations of L„ and 
Un by continuous stopping times and fixed points generalize those in [17] 
and [8]. 

Proposition 4.1. L„(t) = ii°(r-(t)) + l andUn{t) = Rn{T+{t)), where 
T-(t)=mfL<.:i-:^<[^"(^)-f(-)]"H, 

(4.2) 

FJ \t) = sup< x>t: < > . 

La n ) 

Therefore, Step 2 of Procedure M is the same as rejecting all nulls with 
p-values in [r^(tij.),T+(tjj.)], k = 0, . . . ,m. Furthermore, T^{t) —>■ uf and 
(4.1) holds. 
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The results on power and pFDR control in Section 2 for the BH procedure 
can be generalized to Procedure M. Let pt{t) = F{uf) — F{t), p^{t) = F{t) — 

Theorem 4.1. Suppose ti — aF{ti) are different from each other: 

(1) If p^{ti) = and F'{ti) <l/a for all i, then a.s., for n^\, Rn = 0. 
Furthermore, 

(rt {0),r~{ti),r^{ti),...,r- (t a/-i ) , r+ (t a/ -i ) , r ~ (1) ) 

{T0,fi,Ti, . . . ,fM-l,TM-l,fM), as CO, 

where the t^s and f's are independent, each and fk following the distri- 
bution of the last excursion time into (— oo,0) of the random walk Sq = 0, 
Sk = Sk-i+'jk- aF'{tk), TOi/i 71,72,... i-i.d. ~Exp(l). 

(2) If pt (ti) + P^ iU) > for some i = 0, M , then a.s., 

limsupFDR = limsuppFDR < {1 — 7r)a, 



lim power^ = ( h a ) H, 



vr 



where 



.,tM}:[u^ ,ut] [^^g 



n= max , {z^[p*{s)+pj{s)] 

Sc{to, - 

are disjoint for sgS 

(3) Suppose for each i, h := {k > l:F^''\ti) / 0} / 0. Let ii = minlj. // 
pti^i) = for all i but F'[ti) = 1/a for at least one of them, then 

\ogRn as 2^-2 

lim FDR = lim pFDR = (1 - vr a, ^ ^ — , 

n^oo n^oo log n 2£ — 1 

where i = maxj^j : F'(tj) = 1/a}. Additionally, a.s., for 1, the set of 
rejected p-values consists exactly of those in [T^(tj),T+(ti)] with F'{ti) = 
1/a. 

Note that, unlike the BH procedure, if Procedure M cannot control the 
pFDR, then for n ^ 1 it will make no rejections. This way of signaling the 
controllability of the pFDR can also be used to control other types of error 
rates for multiple testing [4]. 

5. Numerical study. In this section we report simulation studies on the 
criticality phenomenon of the BH procedure and Procedure M. All the simu- 
lations are conducted using R [11]. In the simulations, ^3- values are sampled 
as follows: 
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Table 2 

(A) simulations for the BH procedure with a = 0.3 < a, . t^^s is the noncentral 
t- distribution with v d.f. and parameter 5. Fm,n.s is the noncentral F -distribution with 

{m,n) d.f. and parameter S > 0. FDR and pFDR are the MC estimates of 
FDR = (1 — 7r)a and pFDR = (1 — 7r)a* . (B) Estimated pk = Pr{r — k} from the first 







simulation 


m (A) 


vs. Pk — Pr{ 


T = k} based on 


(2.5) 




(A) 


Simu. 




*i 


77 


a* 


FDR 


FDR 


pFDR 


pFDR 


1 


tio 


^10,1 


0.05 


0.512 


0.272 


0.285 


0.532 


0.486 


2 


tio 


iio.i 


0.02 


0.724 


0.297 


0.294 


0.749 


0.709 


3 


-Fio.io 


-Fl0,10,3 


0.05 


0.892 


0.294 


0.285 


0.867 


0.847 


4 


Fn],io 


-Fl0,10,3 


0.02 


0.954 


0.305 


0.294 


0.941 


0.935 


(B) 


k 


pk 


Pk 


k 


pk 


Pk 


k 


pk 


Pk 





0.4900 


0.4140 


8 


0.0176 


0.0221 


16 


0.0022 


0.0060 


1 


0.1474 


0.1350 


9 


0.0098 


0.0185 


17 


0.0022 


0.0051 


2 


0.0908 


0.0881 


10 


0.0094 


0.0155 


18 


0.0016 


0.0044 


3 


0.0604 


0.0646 


11 


0.0066 


0.0131 


19 


0.0004 


0.0038 


4 


0.0474 


0.0500 


12 


0.0068 


0.0112 


20 


0.0008 


0.0033 


5 


0.0410 


0.0400 


13 


0.0042 


0.0095 


21 


0.0010 


0.0029 


6 


0.0296 


0.0323 


14 


0.0038 


0.0081 


22 


0.0004 


0.0025 


7 


0.0198 


0.0266 


15 


0.0038 


0.0070 


23 


0.0010 


0.0022 



1. Sample 9 ~ Bernoulli(7r), where tt is the population fraction of false nulls. 

2. If 9 = 0, sample p ~ Unif(0, 1) and return p. 

3. If 9 = 1, sample X ~ ^'i and return p = 1 — ^0(^)5 where ^0 is the 
distribution under a true null, and ^1 the distribution under a false null. 

5.1. Simulation study on the BH procedure with a < a^,. This part of the 
study consists of four simulations on t- and F-tests. Since the distributions 
of p- values are strictly concave, we apply (3.4), (3.6) and (3.8) to compute 
the critical value a^=. The parameters of the null distributions and values of 
a* are shown in Table 2(A), columns 2-5. The Appendix has some remarks 
on the evaluation of a*. 

In the simulations, a = 0.3. Each simulation contains 5000 runs. As the 
distribution of i?„ appears to converge slowly, in each run the BH procedure 
is applied to n = 10^ sample p-values. The FDR and pFDR are estimated 
by the Monte Carlo (MC) average 

1 5000 1 5000 -, r ^ nl 

1 Vi — — — 1 v-^ i^o lj r.Y > h 
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where rj and vj are the numbers of rejections and false rejections in the jth 
run, respectively, and the number of runs with rj > 0. From the last four 
columns of Table 2(A), we see good agreement between the simulations and 
the theoretical results that pFDR ^ (1 — 7r)Q;* and FDR = (1 — 7r)a. 

We next compare the limiting distribution of R in (2.5) and its estimate 

^ 5000 

Table 2(B) collects the results of the first simulation, which has ^'o = tio, 
^1 = iio,i and vr = 0.05. For relatively small values of k, pk agrees with 
Pk = Pr{T = k} reasonably well. The Kullback-Leibler (KL) distance be- 
tween p and p, D{p\\p) = J2iZoPk^oS^, is equal to 0.0409. On the other 
hand, the total variation (TV) distance dTv(p,p) = 0.1847, which is the 
largest among the four simulations. The KL and TV distances from the other 
three simulations are (0.0042,0.0425), (0.0018,0.0220) and (0.0020,0.0306), 
respectively, indicating that the smaller a/a* is, the faster p converges to p. 

5.2. Simulation study on the BH procedure with a > a^: . This part of the 
study consists of simulations on t-, F- and 2-tests. The parameters of the 
simulations are shown in Table 3(A), columns 2-4. 

In each simulation, a = 0.25, the number of runs is 400, and each run 
samples n = 40000 values. The MC estimates of pFDR and the theoretical 
values of FDR are shown in columns 5-6, which agree with each other very 
well. In all the simulations, FDR is identical to pFDR. 

We next numerically evaluate the LIL in Theorem 2.2. In principle, this 
requires that in each run we sample an infinite sequence of p- values, apply the 
BH procedure to the first n = 1, 2, ... of them, and then check lim sup„ =bL„ = 
A, where we denote 

^5 -j^^j ^ ^ Rn - np^ A= ~^*) 

\/ n log log n ' 1 — aF'(ii*) 

However, this is problematic because (1) the convergence is very slow, and 
(2) applying the BH procedure for every n is computationally too costly. We 
instead take the following approach. In each run, sample N = 320^ = 102400 
p- values. Set = [(200 + Q.2kf\ . Then, for /c = 0, 1, . . . , 600, apply the BH 
procedure to the first Uk p- values to get . Finally, the output of the run 
is maxfc \Ln^\. 

Due to the slow convergence of the LIL, we do not expect to observe 
convergence of max^ I to A in the simulations. However, if we can show 
that, for most of the runs, max^ \ Ln^ \ < A, then it empirically demonstrates 
Rn = np* + Op (^log log n/n). 
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The results of the simulations are summarized in Table 3(B). The values 
of p^, F'{u^) and A involved in the LIL (5.1) are shown as well. Note that 

is also the asymptotic limit of the proportion of rejected p-values; see 
(2.4). The Appendix has some comments on its numerical evaluation. For 
t > 1, let Q{t) be the proportion of runs with max^ \Lny, \ > t\. It is seen 
that, for most of the runs, max^ \ Ln^\ < A. Figure 3 plots all the curves of 
L„/A versus \/ n log log in the 400 runs of simulation 5. Again, it shows that, 
for most of the time, L^/X stays in [—1,1]. The plots from the other five 
simulations show the same property. 

5.3. Simulation study of Procedure M. This part consists of six simula- 
tions to compare the BH procedure and Procedure M. In Section 4 it is noted 
that for Procedure M, it is a useful contraint for pFDR control that each 
selected reference point generate at least (logn)"^ rejections. To examine this, 
we test Procedure M for c = 1.5, 2, and also its modified version which only 
requires that each selected reference point generate at least one rejection. 
These three procedures are denoted M(1.5), M(2) and M(0), respectively. 

Table 3 

(A) Parameters in the simulations for the BH procedure with a = 0.25 > a* . The MC 
estimate pFDR is obtained for the total number of p-values per run n = 40000. 

(B) Simulation results on the LIL for the simulations in (A). Q{t) is the proportion of 

runs with maxfe |L„^ | > t\ 



(A) 



Simu. 


*o 


*i 




77 


a* 


pFDR 


FDR 


1 




^20, 2 




0.2 


1.5 X 10"^ 


0.2005 


0.2 


2 


t20 


^20, 2 




0.15 


2 X 10~^ 


0.2122 


0.2125 


3 


^20, 30 


-F20,30,30 




0.2 


6 X 10"^ 


0.2001 


0.2 


4 


^20, 30 


^20,30,30 




0.15 


8 X 10"* 


0.2122 


0.2125 


5 


iV(0,l) 


A^(2,l) 




0.2 





0.2002 


0.2 


6 


A^(0,1) 


N{2,1) 




0.15 





0.2127 


0.2125 


(B) 


Simu. 


P* 


F'(u.) 


A 


Q(i) 


Q(l.l) 


Q(1.2) 


Q(1.5) 


1 


0.1330 


1.970 


0.946 


0.105 


0.065 


0.043 


0.005 


2 


0.0855 


2.123 


0.843 


0.113 


0.060 


0.040 


0.005 


3 


0.1876 


1.513 


0.888 


0.125 


0.068 


0.025 





4 


0.1313 


1.649 


0.813 


0.155 


0.108 


0.065 


0.015 


5 


0.1453 


1.781 


0.898 


0.125 


0.073 


0.040 


0.008 


6 


0.0974 


1.896 


0.797 


0.1 


0.053 


0.038 


0.008 
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Each simulation contains 1000 runs. In each run, the BH procedure and 
M(c), c = 0, 1.5, 2, are applied to the same set of 20000 sample p-values. 
The distribution \I'o when a null is true is A'^(0, 1), the target FDR control 
level is a = 0.3, and the reference points for M(c) are O.Ol/c, < fc < 100. 
The other parameters for the simulations and the critical values a.^ for the 
BH procedure are shown in Table 4(A), columns 2-4. 

The results are summarized in Table 4(B). In simulation 1, because a 
is only about a third of a*, the BH procedure has little power and pFDR 
~ 1. On the other hand, sup-F' = 4.83, where F' is the density of the p- 
values; see Table 4(A). From (4.1) and Theorem 4.1, if Procedure M has 
reference points close to where F' is maximized, then it can attain pFDR ~ 
(1 — 7r)a = 0.285 with positive power. This is the case for M(1.5) and M(2). In 
contrast, M(0) has substantially higher pFDR despite a little more power. 
This demonstrates that by introducing some constraint on the minimum 
number of rejections at each selected reference point. Procedure M can attain 
pFDR control with positive power even when the BH procedure cannot. 

In simulation 2, since supF' = 2.53, a < 1/F'{t) for every t. From Theo- 
rem 4.1, Procedure M cannot attain pFDR = 0.3. M(1.5) signals this with 

small P{Rn > 0) w 0.03. M(2) signals this by making no rejections. 

On the other hand, the BH procedure and M(0) fail to signal the fact that 
pFDR = 0.3 is not attainable. 




350 



400 



450 



500 



V'tZ log log 71 

Fig. 3. Plots of L^/X vs. log log n for all the runs in simulation 5 in Table 3(A). 
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In simulations 3-6, the distribution has a component with SD greater 
than that of ^q. By Section 3.3, the BH procedure can attain any level of 
pFDR with positive power. However, in simulations 3-4, is extremely 
small; see Table 4(A). Recall that is the limit of the largest p- value re- 
jected by the BH procedure; see (2.4). Consequently, with overwhelming 
chance, the BH procedure only rejects values from ^'o, and hence, has 
little power. In contrast, M(1.5) can attain pFDR~ (1 — Tr)a in both simu- 
lations 3 and 4, while M(2) has substantially worse performance than M(1.5) 
in simulation 4, indicating that its constraint on the minimum number of 
rejections per selected reference point is too hard when vr is small. 

Finally, in simulations 5-6, is large enough for the BH procedure to 
attain pFDR control with moderate power. In simulation 5, both M(1.5) and 
M(2) can attain pFDR control with almost 3 times more power than the BH 



Table 4 

(A) Parameters in the simulations for the BH procedure vs. Procedure M with a = 0.3. 
F' IS the density of the upper-tail p-value under 9^ — N{0, 1) for X ~ (1 — 7r)^o + tt^'i. 
Ut is the limit of the largest p-value rejected by the BH procedure. (B) Simulation results 



(A) 


Simu. 






77 


a* 


supF' 




1 




i7V(-1.3, 0.015) + iA^(l, 0.015) 


0.05 


0.9107 


4.8307 





2 




Same as 1 


0.02 


0.9623 


2.5323 





3 


fiV(- 


1.3,0.015) + fiV(l, 0.015) + iiV(-4,2) 


0.05 





oo 


1.4 X 10~^ 


4 


Same as 3 


0.02 





oo 


2.9 X 10"^" 


5 


fiV(- 


-1.3,0.015) + fiV(l, 0.015) + iiV(4,2) 


0.05 





oo 


0.01 


6 




Same as 5 


0.02 





oo 


0.0039 



(B) 

Simu. Type FDR pFDR Power Simu. Type FDR pFDR Power 



1 


BH 


0.29 


1 





2 


BH 


0.296 


1 







M(0) 


0.3541 


0.3541 


0.7647 




M(0) 


0.9559 


0.9559 


0.0145 




M(1.5) 


0.2882 


0.2882 


0.7643 




M(1.5) 


0.0145 


0.4823 


0.0015 




M(2) 


0.2882 


0.2882 


0.7641 




M(2) 





NA 





3 


BH 


0.2907 


0.9954 


2.9 X 10"*^ 


4 


BH 


0.315 


1 







M(0) 


0.3835 


0.3835 


0.5498 




M(0) 


0.7103 


0.7103 


0.1459 




M(1.5) 


0.2924 


0.2924 


0.5471 




M(1.5) 


0.2912 


0.2913 


0.1365 




M(2) 


0.2917 


0.2917 


0.5356 




M(2) 


0.0242 


0.3565 


0.0112 


5 


BH 


0.2821 


0.2821 


0.1475 


6 


BH 


0.2911 


0.2911 


0.1360 




M(0) 


0.3835 


0.3835 


0.5432 




M(0) 


0.7100 


0.7100 


0.1454 




M(1.5) 


0.2923 


0.2923 


0.5404 




M(1.5) 


0.2913 


0.2913 


0.1362 




M(2) 


0.2914 


0.2914 


0.5283 




M(2) 


0.0217 


0.3559 


0.0101 
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Fig. 4. Simulation 5 for the BH procedure andM(c), c = 0,1.5,2. Top: F'(t) vs. Pcovcr(t). 
Bottom: enlarged views of the functions around the three modes. 
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Fig. 5. Simulation 6 for the BH procedure andM(c), c = 0,1.5,2. Top: F'{t) vs. Pcovcr(i). 
Bottom: enlarged views of the functions around the three modes. 
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procedure. However, in simulation 6, while M(1.5) has similar performance 
to that of the BH procedure, M(2) is significantly worse. 

To identify the sources of the differences among the procedures, one way 
is to look at what values are likely to be rejected. We consider Pcoverit), 
the probability that t G [0, 1] is covered by a rejection region, which can be 
estimated by 

1000 

l{t < largest rejected p-value}, for BH, 

1=1 

1000 

l{t G J{s) for some selected reference point s}, 



-fcovcr (t) 



for M(c), 

where J{s) is defined as in Procedure M, Step 2. 

Figures 4 and 5 plot -Pcover(*) as well as the density of p- values F'{t) for 
simulations 5 and 6, respectively. The density has three modes in each case. 
In simulation 5, for the BH procedure, -Pcover(*) has a single peak aligned 
with the mode at 0, and for M(1.5), Pcovait) has three peaks aligned with 
the three modes, which explains why it has significantly more power than 
the BH procedure. The shape of Pcovcr(i) for M(2) is very close to that 
for M(1.5). In contrast, for M(0), -Pcovcr(i) has many peaks located at the 
reference points in addition to the three major peaks. As a result, M(0) 
rejects too many ^3- values that are highly likely to be associated with true 
nulls, which substantially decreases its capability to control the pFDR. For 
all the procedures, -Pcover(i) is almost identical around 0. 

The situation is different in simulation 6. Because the values of F' at the 
nonzero modes are less than 3, neither M(1.5) nor M(2) makes rejections 
around them. Around 0, the BH procedure, M(0) and M(1.5) yield almost 
identical Pcovcrit), whereas M(2) yields one well below the others. The BH 
procedure and M(1.5) therefore have similar power and pFDR control, and 
notably outperform the other two. 

6. Discussion. Exploratory studies aim to identify real novel signals from 
a large number of observations. For any procedure used to detect the signals, 
its performance is constrained by the amount of useful information in the 
data. The BH procedure is no exception. In this article we demonstrate a 
criticality phenomenon that imposes constraints on the procedure's power 
and pFDR control. Roughly speaking, the criticality is not only due to the 
bounded likelihood ratios, or "signal- noise ratios" of the data [see equa- 
tion (3.3)], but also may be due to the procedure missing important clues 
that separate false nulls from true ones. The whereabouts of these clues, by 
the nature of exploratory study, is unknown a priori. Based on this perspec- 
tive, we propose a multi-reference point procedure which conducts testing 
at multiple locations in the domain of p-values in order to catch useful clues. 
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Many questions remain to be answered on what constraints multiple test- 
ing may have and how to tackle them. First, although the random effects 
model adapted here is helpful in identifying some of the constraints, how 
dependence among observations may affect power and pFDR control is yet 
to be seen. Second, how to detect the performance bound for a procedure. 
From Section 2, to estimate a lower bound for the pFDR attainable by the 
BH procedure, one possible way is to utilize an estimated distribution func- 
tion of the p- values. Alternatively, different subsamples of the p- values may 
be tested at a given target control level. The distribution of the number of 
rejections then could give some clue as to whether the target control level 
is below the infimum of the attainable pFDR. 

The multi-reference point procedure has much room for improvement. 
In essence, the procedure itself consists of multiple tests, one per reference 
point. Therefore, some regulations are needed for the procedure, otherwise 
it may have the same types of problems it is intended to address. One issue 
is how to determine the minimum number of rejections for each selected 
reference point. Second, when the number of tested hypotheses increases, 
it is reasonable to increase the number of reference points. This raises the 
question as to whether there is an optimal rate of increase. It is also possible 
that the reference points can be better allocated according to an estimated 
density function of the p- values. 

APPENDIX: SOME NUMERICAL ISSUES 

Procedure M. The optimization in Step 1 is computed by dynamic pro- 
gramming. First, remove all tj with r{tj) < (logn)'^. Then, relabel the re- 
maining reference points as si, . . . , Sk so that Un{si) < Un{s2) < • • • < Un{sk)- 
Denote Ij = Ln{sj), Uj = Un{sj). Let Ij = {lj,lj + 1, . . . , Uj}. Step 1 of Pro- 
cedure M requires a solution to 

S = max< ^ : A C {I, . . . ,k} such that /„, a€ A, are disjoint 

as well as the maximizing A. Let So = and for j = 1, . . . ,k, 

Sj = max< ^ \Ia\ : A C {I, . . . ,j} such that la, A, are disjoint > . 

[a€A J 

Then S = Sk and Sj = max{5j(j) -|- \Ij\,Sj-i}, where = max{s : Us < Ij}. 
This recursion leads to the following dynamic programming procedure: 
for j = l,...,k 

if + > Sj-i then: select [j] <— 1, previous[j] <— Sj <— S'j(j) + 
else: select [j] <— 0, previous[j] <— j — 1, <— Sj-i 
A^0, s^k 
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while s > 

if select[s] = 1 then: Au {s}, s <— previous[s] 
else: s <— s — 1 
return A and S = Sk 

Numerical evaluation of andp^. For t-tests, a* is evaluated via (3.4) 
and (3.6). To improve numerical precision, each term in (3.6) can be evalu- 
ated by exp(zfc) with Zk = giiiy + ki)/2)-g{k + l)-g{{iy + l)/2) - k \og{V25), 
where g{x) = logr(x) is evaluated by Igamma in R. For F-tests, is eval- 
uated via (3.4) and (3.8). Each term in (3.8) can be computed by exp(zfc), 
with Zk = k\og{5/2) — b{p/2 + k^v/2) — g{k + 1), where b{x,y) = log B{x,y) 
is evaluated by Ibeta in R. 

To evaluate p^,, the main step is to obtain tx^, = max{n:u/a = F{u)}, 
which can be rapidly approximated by the iteration ui = 1, Un+i = aF{un)- 
Then p^ = F{u^). 
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